IN THE CLAIMS: 

The following is a complete list of the claims now pending. This listing 
replaces all earlier versions and listings of the claims. 



Claim 1 (currently amended): Apparatus An apparatus for processing image
data and sound data, comprising: 

an image processor for processing image data recorded by at least
one camera showing the movements of a plurality of people to track each person in three
dimensions; 

a sound processor for processing sound data to determine the 
direction of arrival of the sound; 

a speaker identifier for determining which of the people is speaking
based on the result of the processing performed by [[the]] said image processor and the
result of the processing performed by [[the]] said sound processor; and

a voice recognition processor for processing the received sound data 
to generate text data therefrom in dependence upon the result of the processing performed 
by [[the]] said speaker identifier.

Claim 2 (currently amended): Apparatus An apparatus according to claim 1,
wherein [[the]] said voice recognition processor includes a [[store]] storage unit for storing
respective voice recognition parameters for each of the plurality of people, and a selection
processor for selecting the voice recognition parameters to be used to process the sound
data in dependence upon the person determined to be speaking by the speaker identifier. 



Claim 3 (currently amended): Apparatus An apparatus according to claim 1, 
wherein [[the]] said image processor is arranged to track each person by processing the
image data using camera calibration data defining the position and orientation of each
camera from which image data is processed.

Claim 4 (currently amended): Apparatus An apparatus according to claim 1,
wherein [[the]] said image processor is arranged to track each person by tracking each
person's head. 

Claim 5 (currently amended): Apparatus An apparatus according to claim 1,
wherein [[the]] said image processor is arranged to process the image data to determine
where at least each person who is speaking is looking.

Claim 6 (currently amended): Apparatus An apparatus according to claim 1,
wherein [[the]] said speaker identifier is arranged to identify a person who is speaking in a
given frame of the received image data using the results of the processing performed by
[[the]] said image processor and [[the]] said sound processor for at least one other frame if
the speaker cannot be identified using the results of the processing performed by [[the]]
said image processor and [[the]] said sound processor for the given frame.



Claim 7 (currently amended): Apparatus An apparatus according to claim 1,
further comprising a database for storing at least some of the received image data, the
sound data, the text data produced by [[the]] said voice recognition processor and viewing
data defining where at least each person who is speaking is looking, [[the]] said database
being arranged to store the data such that corresponding text data and viewing data are
associated with each other and with the corresponding image data and sound data.



Claim 8 (currently amended): Apparatus An apparatus according to claim 7, 
further comprising a data compressor for compressing the image data and the sound data 
for storage in [[the]] said database. 

Claim 9 (currently amended): Apparatus An apparatus according to claim 8,
wherein [[the]] said data compressor comprises a data encoder for encoding the image data
and the sound data as MPEG data.



Claim 10 (currently amended): Apparatus An apparatus according to claim
7, further comprising a gaze data generator for generating data defining, for a
predetermined period, the proportion of time spent by a given person looking at each of the
other people during the predetermined period, and wherein [[the]] said database is arranged
to store the data so that it is associated with the corresponding image data, sound data, text 
data and viewing data. 



Claim 11 (currently amended): Apparatus An apparatus according to claim
10, wherein the predetermined period comprises a period during which the given person
was talking. 

Claim 12 (currently amended): Apparatus An apparatus for processing 
image data and sound data, comprising:

an image processor for processing image data recorded by at least
one camera showing the movements of a plurality of people to track each person in three
dimensions;

a sound processor for processing sound data to determine the
direction of arrival of the sound; and

a speaker identifier for determining which of the people is speaking
based on the result of the processing performed by [[the]] said image processor and the 
result of the processing performed by [[the]] said sound processor. 

Claim 13 (currently amended): Apparatus An apparatus according to claim
12, wherein [[the]] said image processor is arranged to track each person by processing the
image data using camera calibration data defining the position and orientation of each
camera from which image data is processed.



Claim 14 (currently amended): Apparatus An apparatus according to claim
12, wherein [[the]] said image processor is arranged to track each person by tracking each 
person's head. 

Claim 15 (currently amended): Apparatus An apparatus according to claim 
12, wherein [[the]] said image processor is arranged to process the image data to determine 
where at least each person who is speaking is looking. 

Claim 16 (currently amended): Apparatus An apparatus according to claim 
12, wherein [[the]] said speaker identifier is arranged to identify a person who is speaking 
in a given frame of the received image data using the results of the processing performed by
[[the]] said image processor and [[the]] said sound processor for at least one other frame if
the speaker cannot be identified using the results of the processing performed by [[the]]
said image processor and [[the]] said sound processor for the given frame.

Claim 17 (currently amended): A method of processing image data and
sound data, comprising:

an image processing step, of comprising processing image data
recorded by at least one camera showing the movements of a plurality of people to track
each person in three dimensions;

a sound processing step, of comprising processing sound data to
determine the direction of arrival of the sound; 

a speaker identification step, of comprising determining which of the 
people is speaking based on the result of the processing performed in [[the]] said image 
processing step and the result of the processing performed in [[the]] said sound processing 
step; and 

a voice recognition processing step, of comprising processing the
received sound data to generate text data therefrom in dependence upon the result of the
processing performed in [[the]] said speaker identification step. 



Claim 18 (currently amended): A method according to claim 17, wherein,
[[the]] said voice recognition processing step includes selecting, from stored respective
voice recognition parameters for each of the plurality of people, [[the]] voice recognition
parameters to be used to process the sound data in dependence upon the person determined
to be speaking in [[the]] said speaker identification step.



Claim 19 (currently amended): A method according to claim 17, wherein, 
in the said image processing step, each person is tracked includes tracking each person by
processing the image data using camera calibration data defining the position and
orientation of each camera from which image data is processed.



Claim 20 (currently amended): A method according to claim 17, wherein,
in the said image processing step, each person is tracked includes tracking each person by
tracking the person's head. 



Claim 21 (currently amended): A method according to claim 17, wherein, 
in the said image processing step, image data is processed includes processing the
image data to determine where at least each person who is speaking is looking.

Claim 22 (currently amended): A method according to claim 17, wherein,
in the said speaker identification step[[,]] includes identifying a person who is speaking in a
given frame of the received image data is identified using the results of the processing
performed in [[the]] said image processing step and [[the]] said sound processing step for at
least one other frame if the speaker cannot be identified using the results of the processing
performed in [[the]] said image processing step and [[the]] said sound processing step for
the given frame.

Claim 23 (currently amended): A method according to claim 17, further
comprising [[the]] a signal generating step, of generating a signal conveying the text data
generated in [[the]] said voice recognition processing step.

Claim 24 (currently amended): A method according to claim 17, further 
comprising [[the]] a received image data storage step, of storing in a database at least some
of the received image data, the sound data, the text data produced in [[the]] said voice
recognition processing step and viewing data defining where at least each person who is
speaking is looking, the data being stored in the database such that corresponding text data
and viewing data are associated with each other and with the corresponding image data and
sound data. 



Claim 25 (original): A method according to claim 24, wherein the image 
data and the sound data are stored in the database in compressed form. 



Claim 26 (original): A method according to claim 25, wherein the image
data and the sound data are stored as MPEG data.

Claim 27 (currently amended): A method according to claim 24, further 
comprising:

the steps a data defining generation step, of generating data defining,
for a predetermined period, the proportion of time spent by a given person looking at each
of the other people during the predetermined period[[,]]; and

a data storage step, of storing the data in the database so that it is
associated with the corresponding image data, sound data, text data and viewing data. 



Claim 28 (original): A method according to claim 27, wherein the
predetermined period comprises a period during which the given person was talking.



Claim 29 (currently amended): A method according to claim 24, further
comprising [[the]] a generating step, of generating a signal conveying the database with
data therein. 

Claim 30 (currently amended): A method according to claim 29, further
comprising [[the]] a recording step, of recording the signal either directly or indirectly to
generate a recording thereof.

Claim 31 (currently amended): A method of processing image data and
sound data, comprising:

an image processing step, of comprising processing image data
recorded by at least one camera showing the movements of a plurality of people to track
each person in three dimensions;

a sound processing step, of comprising processing sound data to
determine the direction of arrival of the sound; and

a speaker identification step, of comprising determining which of the
people is speaking based on the result of the processing performed in [[the]] said image
processing step and the result of the processing performed in [[the]] said sound processing
step.

Claim 32 (currently amended): A method according to claim 31, wherein,
in the said image processing step, each person is tracked includes tracking each person by
processing the image data using camera calibration data defining the position and 
orientation of each camera from which image data is processed. 

Claim 33 (currently amended): A method according to claim 31, wherein,
in the said image processing step, each person is tracked by includes tracking each person
by tracking the person's head.

Claim 34 (currently amended): A method according to claim 31, wherein,
in the said image processing step, the image data is processed includes processing the
image data to determine where at least each person who is speaking is looking.

Claim 35 (currently amended): A method according to claim 31, wherein,
in the speaker identification step[[,]] includes identifying a person who is speaking in a
given frame of the received image data is identified using the results of the processing
performed in [[the]] said image processing step and [[the]] said sound processing step for at
least one other frame if the speaker cannot be identified using the results of the processing
performed in [[the]] said image processing step and [[the]] said sound processing step for
the given frame. 

Claim 36 (currently amended): A method according to claim 31, further 
comprising the step of generating a signal conveying the identity of the speaker identified 
in [[the]] said speaker identification step. 

Claim 37 (currently amended): A storage device storing computer program
instructions for programming causing a programmable processing apparatus to become
configured as an apparatus as set out in at least any one of claims 1, [[and]] 12, 87 and 88.

Claim 38 (currently amended): A storage device storing computer program
instructions for programming causing a programmable processing apparatus to become
operable to perform a method as set out in at least any one of claims 17, [[and]] 31, 89 and
90.

Claim 39 (currently amended): A signal conveying computer program 
instructions for programming causing a programmable processing apparatus to become
configured as an apparatus as set out in at least any one of claims 1, [[and]] 12, 87 and 88.

Claim 40 (currently amended): A signal conveying computer program
instructions for programming causing a programmable processing apparatus to become
operable to perform a method as set out in at least any one of claims 17, [[and]] 31, 89 and
90. 



Claim 41 (currently amended): Apparatus An apparatus for processing
image data and sound data, comprising:

image processing means for processing image data recorded by at 
least one camera showing the movements of a plurality of people to track each person in 
three dimensions; 

sound processing means for processing sound data to determine the
direction of arrival of the sound;

speaker identification means for determining which of the people is
speaking based on the result of the processing performed by [[the]] said image processing
means and the result of the processing performed by [[the]] said sound processing means;
and

voice recognition processing means for processing the received
sound data to generate text data therefrom in dependence upon the result of the processing
performed by [[the]] said speaker identification means.

Claim 42 (currently amended): Apparatus An apparatus for processing
image data and sound data, comprising:

image processing means for processing image data recorded by at 
least one camera showing the movements of a plurality of people to track each person in 
three dimensions;

sound processing means for processing sound data to determine the 
direction of arrival of the sound; and 

speaker identification means for determining which of the people is 
speaking based on the result of the processing performed by [[the]] said image processing
means and the result of the processing performed by [[the]] said sound processing means.

Claims 43-85 (canceled)

Claim 87 (new): An apparatus for processing image data and sound data,

comprising: 

an image processor operable to process image data recorded by at 
least one camera showing the movements of a plurality of people to track each person; 

a sound processor operable to process sound data to determine the
direction of arrival of the sound; and

a speaker identifier operable to determine which of the people is
speaking based on the result of the processing performed by said image processor and the
result of the processing performed by said sound processor. 



Claim 88 (new): An apparatus for processing image data and sound data, 

comprising: 

an image processor operable to process image data recorded by at 
least one camera showing the movements of a plurality of people in three dimensions; 

a sound processor operable to process sound data to determine the
direction of arrival of the sound; and

a speaker identifier operable to determine which of the people is
speaking based on the result of the processing performed by said image processor and the
result of the processing performed by said sound processor.

Claim 89 (new): A method of processing image data and sound data,

comprising: 

an image processing step, of processing image data recorded by at 
least one camera showing the movements of a plurality of people to track each person; 

a sound processing step, of processing sound data to determine the 
direction of arrival of the sound; and 

a speaker identification step, of determining which of the people is 
speaking based on the result of the processing performed in said image processing step and 
the result of the processing performed in said sound processing step. 

Claim 90 (new): A method of processing image data and sound data,

comprising: 

an image processing step, of processing image data recorded by at 
least one camera showing the movements of a plurality of people in three dimensions; 

a sound processing step, of processing sound data to determine the 
direction of arrival of the sound; and 

a speaker identification step, of determining which of the people is 
speaking based on the result of the processing performed in said image processing step and 
the result of the processing performed in said sound processing step.

Claim 91 (new): An apparatus for processing image data and sound data,

comprising: 

image processing means for processing image data recorded by at 
least one camera showing the movements of a plurality of people to track each person; 

sound processing means for processing sound data to determine the 
direction of arrival of the sound; and 

speaker identification means for determining which of the people is 
speaking based on the result of the processing performed by said image processing means 
and the result of the processing performed by said sound processing means. 

Claim 92 (new): An apparatus for processing image data and sound data, 

comprising: 

image processing means for processing image data recorded by at 
least one camera showing the movements of a plurality of people in three dimensions; 

sound processing means for processing sound data to determine the 
direction of arrival of the sound; and 

speaker identification means for determining which of the people is 
speaking based on the result of the processing performed by said image processing means 
and the result of the processing performed by said sound processing means. 